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ABSTRACT 

The supersecondary structure of amyloids and prions, proteins 
of intense clinical and biological interest, are difficult to deter- 
mine by standard experimental or computational means. In 
addition, significant conformational heterogeneity is known or 
suspected to exist in many amyloid fibrils. Previous work has 
demonstrated that probability-based prediction of discrete 
/J-strand pairs can offer insight into these structures. Here, we 
devise a system of energetic rules that can be used to dynami- 
cally assemble these discrete /?-strand pairs into complete amy- 
loid ^-structures. The STITCHER algorithm progressively 
'stitches' strand-pairs into full ^-sheets based on a novel free- 
energy model, incorporating experimentally observed amino- 
acid side-chain stacking contributions, entropic estimates, and 
steric restrictions for amyloidal parallel /?-sheet construction. A 
dynamic program computes the top 50 structures and returns 
both the highest scoring structure and a consensus structure 
taken by polling this list for common discrete elements. Putative 
structural heterogeneity can be inferred from sequence regions 
that compose poorly. Predictions show agreement with experi- 
mental models of Alzheimer's amyloid beta peptide and the 
Podospora anserina Het-s prion. Predictions of the HET-s homo- 
log HET-S also reflect experimental observations of poor amy- 
loid formation. We put forward predicted structures for the yeast 
prion Sup35, suggesting N-terminal structural stability enabled 
by tyrosine ladders, and C-terminal heterogeneity. Predictions 
for the Rnql prion and alpha-synuclein are also given, identify- 
ing a similar mix of homogenous and heterogeneous secondary 
structure elements. STITCHER provides novel insight into the 
energetic basis of amyloid structure, provides accurate structure 
predictions, and can help guide future experimental studies. 
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INTRODUCTION 

Amyloid is a highly-ordered cross (3-protein aggre- 
gate that can be achieved by a very broad set of proteins 
with widely divergent and unrelated amino acid sequen- 
ces. 1>2 Given the right conditions, a great many, perhaps 
most, proteins have the potential to form amyloids. The 
tendency towards amyloid appears to be because of 
intrinsic properties of the peptide backbone, a finding 
of great importance for understanding the evolution of 
protein folds. A much smaller fraction of proteins and 
protein fragments, assemble into amyloid under normal 
physiological conditions, and these are of great interest 
in diverse aspects of biology and medicine. 

Early amyloid research concentrated on amyloids 
associated with a wide variety of mammalian diseases, 
from systemic immunoglobulin amyloidoses to neuro- 
degenerative diseases such as Alzheimer's.^ Initial 
assumptions that accumulated amyloid always caused 
cellular and tissue toxicity, as is still believed to take 
place in peripheral amyloidoses, ^ proved to be 
unfounded upon the discovery of a wider variety of 
amyloids. Amyloids are now known to play roles in 
bacterial biofilms,^ the production of melanin,^ the 
storage of hormones in secretory granules,^ and neuro- 
nal learning and memory. 9 A set of self-templating 
fungal amyloids additionally give rise to epigenetic her- 
itable traits. These bistable "prion" proteins can persist 
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as soluble or amyloid species with different functional 
activities. The self-templating property causes cell-wide 
persistence of one or the other stable state, a status 
passed from generation to generation via cytoplasmic 
transfer of amyloid templates from mother to daughter 
cells.10'11 Increasingly, evidence suggests that the forma- 
tion of amyloids may more commonly be a protective 
mechanism, which especially in the case of the neurode- 
generative amyloidoses, acts as to sequester misfolded 
polypeptides that would otherwise dwell in more toxic, 
and more highly interactive, oligomeric species. There- 
fore, there is great interest in deciphering the structures 
that underlie amyloid states. 

While the secondary structure of amyloid is known to be 
highly |3-rich,12 - ±" experimental structural determination 
has proven highly difficult, with only extremely short seg- 
ments crystallized 18 and a very few successful solid-state 
nuclear magnetic resonance (ssNMR) studies. 19-23 
Because of the scarcity of direct evidence, the nature of 
amyloid and prion supersecondary structures, and their 
relation to sequence have been highly contentious 
topics. 17,24,25 -pjjg p ar allel (3 -helices form a fold widely 
cited as one-potential model for amyloid,26-28 wn ile 
others favor a 'superpleated (3-sheet. Complications 
include the morphological heterogeneity of amyloid struc- 
tures suggested by EM imagery-^7,32 anc j tne demonstra- 
tion of prion 'strains' or 'variants' with differing growth 
and stability phenotypes.-"-35 In the case of the yeast 
prion protein Sup35, such variants have been demonstrated 
to maintain specificity through serial passage^ and have 
been correlated with differences in conformation.^" 

The bistable nature of amyloid prions, as well as the 
observation of heterogeneity and 'strains' in amyloid and 
prion folding, undermines the canonical viewpoint of 
'one-protein, one-fold' long held by theorists of protein 
folding. Instead, a murky view arises of a set of stable 
valleys in a field of conformational configurations, within 
which variations are permitted around common or simi- 
lar folding patterns. Enzymologists have long studied the 
variations in globular protein conformations caused by 
ligand binding, catalytic activity, presence of ions or 
cofactors. Amyloids embody a similar but larger set of 
variations. 

Bryan et al?^ and others have proposed that (3-strand 
pairs form the core energetic subunits that make up amy- 
loid structure, and a (3-strand predictor was designed 
around this named BETASCAN. BETASCAN calculates 
likelihood scores for potential (3-strands and strand-pairs 
based on correlations observed in parallel (3-sheets. A key 
and novel feature of BETASCAN was a maxima-finding 
algorithm that searched the strands and pairs with the 
greatest local likelihood for all of the sequence's potential 
(3-structures. While sufficient to predict sequence regions 
with high potential for amyloidogenic (3-structure, 
BETASCAN did not suggest the most likely overall 
(3-sheet fold. For example, BETASCAN was unable to 



distinguish between the highly similar amyloidal HET-s 
allele and nonamyloidal HET-S allele in Podospora anser- 
ine. 

The STITCHER method described in this article 
extends prediction of amyloid-like proteins by employing 
a combination of probabilistic prediction^" and free- 
energy^ methods for protein structural prediction. 
Since, few atomic-detail templates exist from known 
structures, the algorithm proceeds via a dynamic assem- 
bly of |3-strands in agreement with the twists and turns 
necessary to form a (3-helix or superpleated-sheet fold. 
This philosophy of establishing and then manipulating 
predefined structural components has been previously 
used successfully.'!^'^ 1 The score of each successive (3- 
strand addition is determined through a combination of 
novel free-energy model and BETASCAN-derived proba- 
bilities. The free-energy methods account for the en- 
thalpy of created hydrogen bonds and the entropy of 
linkers, while the probabilities describing the likelihood 
of (3-sheet formation account for the specific side-chain- 
side-chain interactions that confer structural stability. Of 
particular importance to our energetic model is the 
detection of stacking ladders, formed by the side-chain- 
side-chain stacking and bonding of glutamine, aspara- 
gine, tyrosine, and phenylalanine residues. 42-44 t; 0 cap- 
ture the observed structural heterogeneity of amyloids 
(e.g., the "strain" phenomenon), STITCHER calculates a 
list of the top-scoring M = 50 structures instead of just 
a single optimal. From this set of high-scoring candi- 
dates, a consensus structure is derived to represent the 
commonalities in specific strand-pairs among these 50 
structures. Specifically, portions of the structure are con- 
sidered more likely to form if they are seen in 80% or 
more of the top structures. 

In our results, we show that the STITCHER method 
can be used to accurately reconstruct structure, as is 
given by the example of the well-studied Alzheimer's 
amyloid beta-peptide and the Podospora anserina Het-s 
prion. STITCHER is also shown to be less prone to false 
positives than the prior BETASCAN program as it distin- 
guishes the amyloid forming HET-s protein from its 
close, nonamyloidal homolog, HET-S. Novel structural 
predictions are then analyzed for the prion domain of 
the yeast protein Sup35 as well as the Rnql prion and 
alpha-synuclein. The STITCHER algorithm may be 
accessed at http://stitcher.csail.mit.edu. 

MATERIALS AND METHODS 

Algorithmic strategy 

The greatest problem in protein structure prediction is 
the reduction of possible conformation patterns from a 
mind-boggling potential space to the few viable and sta- 
ble conformations. In the present case, the restriction to 
parallel, all (3-structures massively reduces the conforma- 
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tional space and simplifies the prediction problem. The 
maxima-finding algorithm of BETASCAN further 
reduced the space of solutions to only discrete secondary 
structural elements, identifying locally probable strands 
and strand-pairs. While this level of detail is useful in 
some cases, supersecondary, and tertiary structural infor- 
mation has been completely lost. The goal of STITCHER 
is to build upon the successful probability models of 
BETASCAN to then reconstruct supersecondary struc- 
tures based on a minimal set of additional energetic fea- 
tures and constraints. While complex hydrophobic and 
kinetic interactions are known to affect the rate of in 
vivo supersecondary structure formation, STITCHER 
relies on this minimal set of energy parameters to make 
the simplest possible model of thermodynamically possi- 
ble stable structures that could form. Specifically, 
STITCHER identifies the importance of stacking pairs 
and entropic penalties for (3-strand linkers. Further, 
(3-strands are assembled into full structures using param- 
eters that restrict the length of linkers and the distance 
between paired strands, parameterized by an interpreta- 
tion of tertiary (3-helix or superpleated- sheet fold. A 
dynamic program is used to combine ("stitch") BETAS- 
CAN strand-pairs using these energetic features and 
physical constraints, and outputs a list of the top 50 scor- 
ing assemblies. Dynamic programs utilize the fact that 
the calculation of scoring assemblies will reuse many 
smaller calculations. A piece of pseudocode may demon- 
strate how STITCHER uses the principle: 

Calculate_beta_sheet_score 
(rung number) : 

If lookuptable (beta_sheet_score 
(rung number - 1) ) exists: 
All but last rung score = 

lookuptable (beta_sheet_score 
(rung number - 1) ) 

Else 

If rung number = 0: 

All but last rung score = 

Calculate_rung_score (first_ 

rung) 
Else : 

All but last rung score = 
Calculate_beta_sheet_score 
( rung number) 
End_if 

Lookuptable (beta_sheet_score 
(rung number - 1) ) = 

All but last rung score 
End if 

Calculate rung_score (last_rung) 
Beta sheet score = rung score + 
All but last rung score 
Return Beta_sheet score 
End Calculate beta sheet score 



In the pseudocode above, we show a simplified version 
of a portion of our algorithm calculating a score for a 
partial structure. First, the algorithm checks if a smaller 
piece of the structure — namely, all but the last rung — has 
already been scored. If so, it reuses that score without the 
need to repeat a calculation. If it is not scored, it calcu- 
lates that smaller partial structure's score and stores it. 
Crucially, for any structure larger than one rung, the 
algorithm recurses, storing the smaller structures' scores 
along the way. It then scores the additional rung and 
stores that score as well. By using a recursive algorithm 
to make calculations only as needed and storing results 
for reuse, any partial structure need only be calculated 
once, saving greatly on calculation time and resources. 
When no more additional structure is available, the com- 
pleted structure can then be considered for possible out- 
put depending on its score. Once a list of top candidate 
structures is assembled, we use polling to assess their 
agreement or disagreement for specific strand-pairs. 

Fold constraints and parameters 

STITCHER constrains the assembly of strand-pairs to 
a limited space of amyloid-like parallel (3-sheets. Follow- 
ing the conventions of previous authors26>37,45,46 and 
the evidence from known amyloid models,20~22,29,47 
we define an arrangement of n possible sheets, discretized 
by rungs. Each rung contains n strands, each contribut- 
ing to a sheet. For a structure of m rungs there will be 
mn strands and mn — 1 linkers connecting strands to 
each other. Every rung is connected by strand-pairs of 
length L, identified by residues i and j stacked atop one 
another {i,j > i, L > 1}. Therefore each structure con- 
tains (m — l)n strand-pairs in each putative structure. A 
complete amyloid protofibril is modeled as many copies 
of such structures. 

Bounds on the parameters m and n further reduce the 
number of choices to be made in selecting plausible 
structures. The two models of amyloids in the literature 
may be described by constraints on parameters. The (3- 
helix fold requires m > 1, n > 2l9>21,48. ne arly all 
observed cases in nature, including solved structures of 
amyloids, are either n = 2 or n = 3. A simple model of 
a superpleated |3-sheet is formed by parameters m = 1, n 
> 2. In this model every copy of the amyloid protein 
forms a single rung of n (3-strands. In the case of m = 1, 
therefore, every strand-pair consists of two identical cop- 
ies of a strand, and i = j for all strand-pairs. Therefore, 
two possible sets of parameters were considered for anal- 
ysis: n = {1,2,3}, m > 1 and m = 1, n > 1, i = 

Finally, we restrict the space of potential strand-pair com- 
binations such that m > 1 amyloids must loop back to a 
location near their starting point at the end of each rung. 
Therefore, the distance from the end of the first strand of 
any strand-pair to the start of the second must be longer 
than the strand itself: (j — i) >2L. Likewise, strand-strand 
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linkers can be no shorter than three residues long — the 
tightest turn observed in crystal and NMR structures. 

Free-energy scoring function 

Amyloid structures are scored using a formula that 
includes the BETASCAN scores of their strand-pairs, as 
well as bonuses reflecting the stabilizing influence of non- 
backbone hydrogen bonding, and the entropic cost of 
restricting backbone movement into the loops of strands. 
The weight of each component in the scoring function 
was determined by summing estimates of their free- 
energy contributions to stability. Following the pattern of 
Zhang et al.,^ the Gibbs free energy delta G of the 
change from unfolded to folded state is 

AG = A£ c + AE cl - TASH ot 

where AE C is the contact enthalpy of placing residues to- 
gether, Afi el is the electrostatic energy associated with ionic 
interactions, and TAS^ ot is the entropy of folding. Amy- 
loids are typically sparse in charged amino acids, and the 
stronger partial dipoles associated with other amino acids 
are usually incorporated into hydrogen bonding. In this 
analysis, the electrostatic interaction is therefore only 
considered with reference to hydrogen bonds, and A_E e i is 
set to zero. Contact energies can be further decomposed 
into energies contributed by backbone-backbone, back- 
bone-side-chain, and side-chain-side-chain interactions: 



AG = A£ bb _ 



bb 



A£ b b-sc + A£ st 



TASP b ot 



Role of BETASCAN scores 



We model amyloid peptide backbones to contain only 
(3-strands, with a linear arrangement constrained by the 
hydrogen bonds to other strands and linkers, which are 
only constrained by the strands at their beginning and 
end. There is one hydrogen bond per residue in the length 
of each strand-pair, and an equivalent entropy loss for the 
constraint it imposes on backbone flexibility. Therefore, 
we separate the entropy into linker and strand terms, and 
rearrange to express them with reference to length: 

AG = (A£ bb - TAS S „) + A£ bb _ sc + A£ sc _ sc - TAS lmk 

AG = (A^H-bond — TASbb-rcsJistr. + A£bb-s< 



Side- chain/backbone interactions are a primary deter- 
minant of p-sheet propensities, both through van der 
Waals interactions-^ and entropy of solvation. 51 These 
factors explain much of the known relative affinities of 
amino acids,^^ although these propensities must be 
interpreted in context.53 The BETASCAN algorithm uses 
these propensities to estimate relative probabilities of for- 
mation of a strand-pair, normalized by length to allow 



comparison of strands with different lengths. This log- 
odds estimate of the relative probability of a (3-strand 
conformation is applied to AGp f orm . The energies from 
the direct hydrogen bonds made by all residues in the 
strand may be combined with the side-chain effects and 
estimated by the entire strand score: 

AG- [( A£h 

—bond H" A-Z?bb-res TASbb-res.)^str.] 



strands 



AG= [ Score < 

strands 



res. 

Pform /^tr. 



A£ sr _ sr — TAS un k 

T A Slink 



-^sc— sc 

AF 



Side-chain stability bonuses 

Side- chain/side- chain energies include hydrogen bond- 
ing in the case of asparagine and glutamine stacking,^ 
pi-bond orbital stacking in the case of tyrosine and phe- 
nylalanine,-^ and van der Waals interactions between 
side-chains. The first two contribute bonus stability 
beyond that calculated by BETASCAN. The last has been 
shown to be very small (less than 0.20 kcal/moi^) and 
can be disregarded. We therefore set: 

^ (« XX )A£ XX 

—stack 

X={Q,N,Y,F} 



Entropic penalties 

Finally, we consider the entropy of the linkers, defined 
as free loops of peptide between |3 -strands. We note, but 
do not include here, the difficult problem posed by the 
paradoxical contributions of polyglutamines^S to the en- 
tropy of (3-structures such as the huntingtin fibril. The 
problem of calculating the linker entropy is otherwise a 
subset of the general problem of polymer condensation 
entropy-^ and bears remarkable similarity to that of disul- 
fide bond entropy. ^ The entropy may be calculated as 



AS= -R In 



\(27l4a i-link) 



3/2 



Vends 



where R is the gas constant, Z aa the length from ct-carbon 
to ct-carbon, 3.8 A, L uruc is the number of residues in the 
linker, and v en d s is the volume the ends of the linker may 
occupy. A hydrogen bond is approximately the same 
length as the distance between sulfide groups in a disul- 
fide bond, namely 4.8 A. The entropy calculation, using 
these values, may be simplified to 

^ . „ kcal 3 , 

TAS= -2.1 -Rln I link 

mol res. 2 

for T Ri 300 K. Because we are comparing structures 
known to have linkers, we disregard the constant term 
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and make an estimate yielding the relative entropy of a 
linker, 



) k , Cal ] i In,,], 



101 



This formula is used in two different ways: to calculate 
the entropic penalty of adding (3-strands extending a 
sheet and to calculate the entropic penalties accrued 
from combining multiple sheets into the fiber. The differ- 
ence is in the value of L\ ink . For the former, we assess the 
entropy of forming a loop from a free polypeptide-chain 
without regard to strand-pairs (as the strand-pairs cannot 
form until the chain is in proximity to itself). In this 
case L^. 8 = — i\ is the difference between the N-ter- 
mini of the two strands. For linkers between two separate 
(3-sheets / and g, the length of the linker is the number 
of residues between strand-pairs, counting both the 
upper- and lower-chains of the pair: 



L f ' g 

4ink 



(if + if)+ig-(if + if) 



The form of the scoring function for STITCHER may 
now be fully described as 



AG= ^ ^Score^AG^ rn ; s jl str .J + ^ («xx)A£ xx _ st ack 

strands X={Q,N,Y,F} 



E 0.9 



kcal 
mol res 



In 47 "E 0-9 



links 



kcal 
mol res 



In!, 1 



Evaluation of score 

The STITCHER algorithm uses a dynamic program- 
ming algorithm to evaluate estimated AG for the combi- 
nations of strand-pairs that can be combined into tem- 
plates matching the parameters described earlier. To do 
so, the calculation of AG is subdivided into calculation 
by rungs. The total free-energy change can then be calcu- 
lated by summing the stability contribution of any rung r 



containing strands {r 1 . . .r n } and linkers {Yj 
with a linker to the previous rung, as 



1,2 „("-!)- 
link" "'link 



AG= J2 ^ 

r={l-m] 

AG r = £ Score AG^T L 1+ £ 

—stack 

str={ri-r„} X={Q,N : Y,F} 



E 0.9 



*={l-n} 



kcal 



mol res 



In K 



■(g,i)-g 

link 



If the stacking bonuses, a directly sequence-dependent 
calculation, are considered apart from the strand and 
linker scores, which are only indirectly sequence-depend- 
ent, then the rung calculation can be partially separated 
into strand calculations: 



AG r = Yl 

g={l-n} 

kcal 



Score (AG^)I rr 



+ E (4*) A£ ; 

X={Q.N,Y,F} 



'XX —stack 



Energy weights 

To calculate this function, we must estimate 
A£xx-stack- Experimental data49-51,53 suggests the free- 
energy of (3-strand formation per residue to be ~ 1 kcal/mol 
res, a combination of the enthalpy of the hydrogen bond 
and the entropy of solvation as influenced by side-chains. 
This is a somewhat rough estimate because of context- 
dependency.-^ For the bonuses and penalties, we assess 
the contribution of additional hydrogen bonds to the 
free-energy. The free-energy of the hydrogen bond is 
again offset to some degree by solvation, though not as 
strongly as for the backbone. The rough bonus weights 

A£QQ_ sr ack ~ 1 mofres. ' A^NN-stack ~ ^ ^Tres. ' ^-EyY-stack 

~ ^ mo? res were use£ l f° r this study. The extra weighting 
of NN over QQ is justified in two ways. First, asparagine 
(N) has a shorter distance from backbone to amide than 
does glutamine (Q). Additionally, experimental data^3 
suggests that in at least one prion, replacing all gluta- 
mines with asparagines provides better stability of the 
prion-fold as compared to replacing all asparagines with 
glutamines. These estimated energy weightings may 
become more accurate as calorimetry of side- chain-side- 
chain interactions becomes available. 



By calculating the free-energy scores of strand-pairs 
and linkers as subproblems of rung scoring, and rung 
scores as subproblems of structure assembly, the dynamic 
programming method can be used to iteratively calculate 
the M structures with the highest score by tracing back 
through internally consistent partial structures that do 
not violate the defined fold constraints. 

Evaluating consensus outputs 

The composition of the M highest structures (with 
default M of 50) is assessed by scanning over all struc- 
tures for included strand-pairs by the locations (;', j) of 
their termini. If the number of strand-pairs in the M 
highest structures with N-termini of (i ± 2, j ± 2) total 
more than 80% of M, the location is noted as a consen- 
sus structure element, and the ranges of z', j, L and 
strand-pair score over the strand-pairs in the (i ± 2, j ± 
2) region are output. For display purposes, the strand- 
pairs with matching lower and upper strands are aligned 
vertically to reconstruct predicted (3-sheets. The output 
of STITCHER includes the set of M predicted structures, 
a diagram of local structure space, and the top scoring 
consensus structure. 
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Figure 1 

STITCHER results for A3 (amyloid beta) (m = 1). At left, a contact 
map of the 50 highest-scoring folds. The horizontal and vertical axes 
indicate, respectively, the residue numbers (counted from the N- 
terminus) of the lower and upper strands in each strand-pair of the 
structures. Starting locations of strand-pairs are indicated by circles, 
with size of circle (small to large) and color of circle (magenta to black, 
right-hand color spectrum) indicating the strength of the vote of the 
top 50 structures for that strand-pair. The strand-pairs are drawn along 
their length in shades of orange, with stability increasing from yellow to 
red (left-hand color spectrum). Fold structures are indicated by the 
dotted lines connecting strand-pairs into rungs and sheets. Structure 
scores are indicated by shades with stability increasing from blue to 
green (center color spectrum). At right, the highest-scoring fold. Each 
strand-pair is denoted by its score (Sc) and its length (L). To the extent 
possible, rung-pairs proceed from left to right and sheets from top to 
bottom. Numbers to the left of the strands indicate the number of the 
residue immediately preceding the N-terminus of the strand. Slanted 
lines indicate the first residue of the strand, arrowheads the last residue 
of the strand, and connecting line(s) indicate the possible residue- 
pairing(s) of the first residues of the strands. 



RESULTS AND DISCUSSION 

Amyloid -beta 

STITCHER was tested on amyloid beta, an amyloid 
with two experimental NMR models,-^'^ allowing both 
superpleated sheet [m = 1, i = j) and |3-helix (m > 1) 
structures (see symbols used for definitions of all varia- 
bles). The results are shown in Figure 1. In the case of 
amyloid beta, the highest-scoring structures all incorpo- 
rate at least one i = j strand-pair (the 10-residue 
(3-strand from isoleucine 31 to valine 40 inclusive). Sev- 
eral of the highest-scoring structures include strands ana- 
logs to those in solved NMR structures, including strands 
beginning at tyrosine 10 (corresponding to^) and at leu- 



cine 17 (corresponding to 20). However, the highest- scor- 
ing structures include a first strand with one residue 
shifted from a perfect in-register parallel alignment, in 
the region between histidine 13 and alanine 21 (sequence 
HHQKLVFFA). This region is known to exhibit struc- 
tural variability, especially at differing pH.^8 

Het-s/S 

A key goal in amyloid folding studies is to distinguish 
amyloidogenic from nonamyloidogenic sequences. 
STITCHER was run on the small-s and big-S variants of 
the Het-s mating compatibility factor to test its ability to 
make this classification. The small-s allele of this protein 
is known to form an amyloidal prion.21>22 In contrast, 
the big-S allele does not form an amyloid structure. 

The results for Het-s and Het-S are displayed as Figure 
2(A,B), respectively. Immediately, evident is the greater 
"connectivity" of Het-s predictions (i.e., the high number 
of valid sequential orderings of (3-strand pairs). On the 
other hand, very few of the predicted Het-S strand-pairs 
are able to form multiple-sheet structures. Furthermore, 
the single HET-s strand-pair most often seen in the list of 
50 top structures (identified by N-terminal residue-pair 
isoleucine 14-threonine 49) overlap with the conformation 
of strands lb, 2a, 3b, and 4a in the most recently solved 
NMR structure,-^ although off by one in pairing registry. 

Sup35p: case study in three species 

We used STITCHER to computationally investigate the 
amyloidogenic impact of minor sequence differences in 
homologs proteins. Three homologs of the yeast prion 
Sup35p were chosen, taken from Candida albicans, Sac- 
charomyces cerevisiae, and Yarrowia lipolytica. The out- 
puts of these predictions are shown in Figure 3. 

The top 50 structural predictions for the C. albicans 
sequence suggest only a single possible folding route [Fig. 
3(A)], as all top structures contain recurrent (3-strand 
pairs. The consensus structure thus stretches nearly the 
entire sequence. Further, the probabilistic signal from 
each structural strand-pair is fairly weak, with most 
strands consisting of only three or fewer residues. It 
appears that most of the amyloidogenic potential 
detected in this sequence comes from its long polyglut- 
amine stretches. STITCHER accounts for these via loop- 
based stacking bonuses that stabilize (3-sheet structure. 
Specifically, tyrosine and phenylalanine ladders serve to 
align the structure, in agreement with some mutagenic 
experiments. 

The top 50 structural predictions for Sup35p S. cerevi- 
siae [Fig. 3(B)] and Y. lipolytica [Fig. 3(C)] are more het- 
erogeneous. Much of this heterogeneity is due to the 
recurrent repeats in the Sup35p sequence. For instance, 
the strand beginning at glycine 44 can favorably be 
paired with the repeats at glycines 68, 77, and 97. Three 
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Figure 2 

STITCHER results (m = 2) for the two alleles of the Podospora anserina mating compatibility protein: (A) Het-s and (B) Het-S. Contact maps, at 
left, and top-scoring structures, at right, are as described in the caption to Figure 1. 



pairs in the S. cerevisiae structure, identified by their N- 
terminal residues as asparagine 12-glutamine 38, tyrosine 
29-tyrosine 49, and glutamine 38-glutamine 62, are 
highly recurrent. Interestingly, these pairs correspond to 
the "head" area identified in previous experimental stud- 
ies,^ 6 where mutations are known to have large effects 
on amyloidogenic potential. 59 J_ n contrast, predictions 
for the region beyond residue 91 shows increased var- 
iance. This "tail" region also exhibits experimental vari- 
ability in different protein "strains."^ 6 The Y. lipolytica 
structure [Fig. 3(C)] shows similar traits, but with fewer 
recurrent (3-structures most likely as a result of the irreg- 
ularity in sequence repeats. 



The effects of ladders and residue-repeats in 
Alpha-synuclein and Rnqlp 

To further study the effect of residue side-chain 
stacking, STITCHER was also used predict the struc- 
tures of two other important amyloidogenic proteins: 
human alpha-synuclein and the S. cerevisiae prion 
Rnqlp (Fig. 4). Although, a putative structural model 
has been published for alpha-synuclein, 61 ^ no structure 
has been proposed for Rnqlp. However, it has been 
shown that Rnqlp may facilitate templating in Sup35p 
fibers. 61 ' 62 

The top 50 structure predictions for Rnqlp [Fig. 4(A)] 
follow a similar pattern as in C. albicans Sup35p, identi- 
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Figure 3 

STITCHER results for Sup35p (m = 3) in three species. (A) C. albicans, (B) S. cerevisiae, and (C) Y. lipolytica. At left, a contact map of the 50 
highest-scoring folds. At right, the top-scoring structure. For S. cerevisiae, the highest-scoring structures for n = 2 and n = 3 are presented. For 
description of colors, numbers, and lines, see the caption to Figure 1. 



fying many recurrent (3-structures. Although, nonspecific 
loop-based side-chain ladders of glutamines contribute 
the most to the energetic score of these structures, 



shorter strands including glutamine, phenylalanine, tyro- 
sine, and asparagine ladders also contribute to the identi- 
fication of specific (3-strand pairs. (3-structure content is 
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Figure 4 

STITCHER results for (A) S. cerevisiae Rnqlp (m = 2) and (B) ct-synuclein (m = 2). At left, a contact map of the 50 highest-scoring folds. At 
right, a consensus structure assembled from all clusters of strand-pairs {(i 0 — 2) > i > (i 0 + 2), (j 0 — 2) > j > (; 0 + 2)} found in > 80% of the top 
50 highest-scoring folds. The range of strand positions and lengths is indicated by the shortest and longest possible strand arrows, drawn at the 
most N- and C-terminal possible locations. Connecting lines indicate the possible pairings of the first residues of the strands making up the strand- 
pair. The range of lengths and scores for a set of possible strand-pairs is indicated at L and Sc, respectively, for each set. The numbers to the left of 
the strands indicate the residue immediately before the leftmost possible residue in each strand (i.e., the residue before the first written above and 
below the strand arrows). For description of colors and numbers, see the caption to Figure 1. 



highest in the region of residues 53-175, albeit with large 
exterior loops exposing a high number of glutamine and 
asparagine residues. This suggests that polyasparagine 
and polyglutamine stretches limit highly specific (3-strand 
pairing. 

(3-helical STITCHER predictions for alpha-synuclein 
[Fig. 4(B)] output a strong signal for recurrent strand- 
pairs in the region roughly between residues 20 and 80. 
The highest-scoring structures are composed of two (3- 



sheets; the first with strands of length 3 and the second 
with strands of lengths 7-10. Interestingly, some strand 
pairing regions exhibit alternate registries in nearly the 
same locale and with nearly equal scores. As with amy- 
loid beta, this predicted variability is associated with the 
existence of repeated residues found near the edge of the 
strands (e.g., valines 15, 16, alanines 17-19; alanines 29, 
30; valines 48, 49, glycines 67, 68, valines 70, 71, alanines 
88-91, and lysines 96, 97). 
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CONCLUSION 

STITCHER introduces a novel energetic scoring model 
for amyloid fibrils and an efficient algorithm for dynami- 
cally assembling discrete (3-strand pairs into complete 
amyloid structures. The system of physical constraints 
used to "stitch" (3-strands into a complete structure 
offers an accurate generalization of successful template- 
based methods such as BETAWRAP and its succes- 
sors,^^ 1,52 which only conformed to rigid templates. 
In addition, STITCHER takes into account the unavoid- 
able uncertainty in free-energy parameters and the poten- 
tial heterogeneity of amyloid folds by computing multiple 
solutions instead a single optimal. Although, the highest- 
scoring fold frequently offers a good solution, the best 
results are achieved by assembling the top solution's most 
frequently occurring strand-pairs into consensus struc- 
tures. It should be noted that the particular fold taken by 
an amyloid is sensitive to environmental conditions, 
including pH, concentration of protein, and presence of 
chaperone proteins. Thus, the structure with the highest 
STITCHER score may not be that taken by the protein 
under experimental conditions. 

The results for alpha-synuclein, Rnqlp, and to some 
extent the various Sup35p proteins highlight the role of 
single-residue repeats, motif repeats, and sidechain stack- 
ing ladders in amyloid structure. Single-residue repeats 
may contribute to structural stability through stacking 
ladders, especially in the cases of polyglutamine and pol- 
yasparagine. However, this stability increase comes at the 
expense of (3-strand pairing specificity, as the importance 
of aligning any particular pair of residues is reduced. A 
similar but diminished effect is seen in repeats composed 
of multiple-residue motifs, such as in S. cerevisiae 
Sup35p. This suggests that the (3-strand pairing specific- 
ity for repeat-heavy amyloidogenic proteins may be con- 
veyed through two other features of a structure: short 
intervening linker loops, and the formation of strand- 
pairs with more rare stacking ladders such as histidine 
and phenylalanine. 

In the cases of known amyloid structures amyloid beta 
and HET-s, STITCHER is able to predict the core regions 
of (3-structure observed experimentally. Conversely, the 
predicted results for Het-S agree with the protein's 
observed nonamyloidogenic nature, despite a high (3-pro- 
pensity sequence that is nearly the same as the amyloido- 
genic HET-s prion. While a more robust analysis and 
verification would require additional amyloid structure 
determination, the STITCHER methodology appears to 
be a valuable addition to the growing number of amyloid 
detection algorithms. Further, as new experimental data 
provides better insight into the nature of (3-strand ener- 
getics, and new amyloid structures arise, the STITCHER 
algorithm could be readily extended. While some inter- 
pretation and experimental verification is necessary for a 
complete understanding of amyloid folding, the identifi- 



cation of the range of most likely folds should greatly 
enhance the further computational and experimental 
investigation of amyloid and prion proteins. 
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